Efficient Record De-Duplication Identifying Using Febrl Framework

Similar resources

An Efficient way of Record Linkage System and Deduplication using Indexing techniques, Classification and FEBRL Framework

Record linkage is an important process in data integration, used to merge and match records that refer to the same entities across several databases and to remove duplicates among them. Deduplication is the process of removing duplicate records within a single database. In recent years, data cleaning and standardization have become an important step in data mining tasks. Due to the complexity of today’s databases, fin...

A Bayesian Approach to Graphical Record Linkage and De-duplication

We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation...

ViDeDup: An Application-Aware Framework for Video De-duplication

Key to the compression-capability of a data deduplication system is the definition of redundancy. Traditionally, two data items are considered redundant if their underlying bit-streams are identical. However, this notion of redundancy is too strict for many applications. For example, for a video storage platform, two videos encoded in different formats would be unique at the system level but re...

An Efficient Algorithm for De-duplication of Demographic Data

This paper proposes an efficient algorithm for de-duplicating demographic records that contain two name strings per individual, viz. GivenName and Surname. The algorithm consists of two stages: enrolment and de-duplication. In both stages, all name strings are reduced to generic name strings with the help of phonetic-based reduction rules. Thus there may be several name strings havin...

Journal

Journal title: IOSR Journal of Computer Engineering

Year: 2013

ISSN: 2278-8727,2278-0661

DOI: 10.9790/0661-01022227